Parallel-Wiki: A Collection of Parallel Sentences Extracted from Wikipedia
Abstract
Parallel corpora are essential resources for certain Natural Language Processing tasks such as Statistical Machine Translation. However, the existing publicly available parallel corpora are specific to limited genres or domains, mostly juridical (e.g. JRC-Acquis) and medical (e.g. EMEA), and there is a lack of such resources for the general domain. This paper addresses this issue and presents...
Similar resources
Grouping Synonymous Sentences from a Parallel Corpus
Recently, natural language processing research has focused on data and processing techniques for paraphrasing. Unfortunately, little data is available for paraphrasing. There are some research reports on collecting synonymous expressions from parallel corpora, though no corpus suitable for collecting a set of paraphrases is yet available. Therefore, we obtain a few variations of e...
Identifying Parallel Documents from a Large Bilingual Collection of Texts: Application to Parallel Article Extraction in Wikipedia
While several recent works dealing with large bilingual collections of texts, e.g. (Smith et al., 2010), seek to extract parallel sentences from comparable corpora, we present PARADOCS, a system designed to recognize pairs of parallel documents in a (large) bilingual collection of texts. We show that this system outperforms a fair baseline (Enright and Kondrak, 2007) in a number of contr...
Automatically Extracting Parallel Sentences from Wikipedia Using Sequential Matching of Language Resources
In this paper, we propose a method to find similar sentences based on language resources for building a parallel corpus between English and Korean from Wikipedia. We use a Wiki-dictionary consisting of document titles from Wikipedia and bilingual example sentence pairs from a Web dictionary instead of a traditional machine-readable dictionary. In this way, we perform similarity calculation betwe...
Parallel Sentences Mining From The Web
Parallel sentences can benefit many NLP applications (e.g., machine translation, cross-language information retrieval). In this paper, the candidate bilingual web pages are returned by submitting sentence pairs to a search engine and are then validated by surface patterns. We propose an algorithm to extract candidate bilingual resources and filter out useless bilingual web pages. The pair sentences includ...
Journal
Journal title: Research in Computing Science
Year: 2013
ISSN: 1870-4069
DOI: 10.13053/rcs-70-1-11